62 research outputs found

    Multi-Phase Task-Based HPC Applications: Quickly Learning how to Run Fast

    Parallel application performance strongly depends on the number of resources. Although adding new nodes usually reduces execution time, excessive amounts are often detrimental, as they incur substantial communication overhead that is difficult to anticipate. Characteristics such as network contention, data distribution methods, synchronizations, and the overlap of communication and computation generally impact performance. Finding the correct number of resources can thus be particularly tricky for multi-phase applications, as each phase may have very different needs, and the popularization of hybrid (CPU+GPU) machines and heterogeneous partitions makes it even more difficult. In this paper, we study and propose, in the context of a task-based geostatistics application, strategies for the application to actively learn and adapt to the best set of heterogeneous nodes it has access to. We propose strategies that use the Gaussian Process method with trends, bound mechanisms for reducing the search space, and heterogeneous behavior modeling. We compare these methods with traditional exploration strategies in 16 different machine scenarios. In the end, the proposed strategies gain up to ≈51% compared to the standard case of using all the nodes, while having low overhead.
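
    The paper's strategies build on Gaussian Processes with trend and heterogeneity modeling; the sketch below only illustrates the core search loop, assuming scikit-learn and a hypothetical measure_runtime(n) callback that times one application phase on n nodes.

        # Minimal sketch of GP-guided selection of a node count.
        # measure_runtime is a hypothetical stand-in for timing one phase.
        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, ConstantKernel

        def pick_node_count(measure_runtime, n_min=1, n_max=64, budget=8):
            candidates = np.arange(n_min, n_max + 1)
            tried = [n_min, n_max]
            times = [measure_runtime(n) for n in tried]
            for _ in range(budget - 2):
                gp = GaussianProcessRegressor(ConstantKernel() * RBF(),
                                              normalize_y=True)
                gp.fit(np.array(tried).reshape(-1, 1), times)
                mu, sigma = gp.predict(candidates.reshape(-1, 1), return_std=True)
                lcb = mu - 1.96 * sigma   # favor counts that look fast but uncertain
                n = int(candidates[np.argmin(lcb)])
                if n in tried:
                    break                 # the search has converged
                tried.append(n)
                times.append(measure_runtime(n))
            return tried[int(np.argmin(times))]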

    Communication-Aware Load Balancing of the LU Factorization over Heterogeneous Clusters

    Large clusters and supercomputers are rapidly evolving and may be subject to regular hardware updates that increase the chances of becoming heterogeneous. Homogeneous clusters may also exhibit variable performance due to processor manufacturing variability, or may include partitions equipped with different types of accelerators. Data distribution over heterogeneous nodes is very challenging but essential to exploit all resources efficiently. In this article, we build upon the flexibility of task-based runtimes to study the interplay between static communication-aware data distribution strategies and dynamic scheduling of the LU factorization over heterogeneous sets of hybrid nodes. We propose two techniques derived from state-of-the-art 1D×1D data distributions: first, using fewer computing nodes towards the end of the factorization to better match performance bounds and save computing power; second, carefully moving a few blocks between nodes to further optimize the load balance. We also demonstrate how 1D×1D data distributions, although tailored for heterogeneous nodes, can scale better on homogeneous clusters than classical block-cyclic distributions. Validation is carried out in both real and simulated environments, on homogeneous and heterogeneous platforms, demonstrating compelling performance improvements.
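
    The 1D×1D distributions studied in the article are more sophisticated than this, but the underlying intuition of giving each node work proportional to its measured speed can be sketched as follows (the speeds and the greedy assignment rule are illustrative only).

        # Toy speed-proportional assignment of block columns to nodes.
        def distribute_columns(n_cols, speeds):
            shares = [s / sum(speeds) for s in speeds]
            owner, assigned = [], [0] * len(speeds)
            for col in range(n_cols):
                # Give the next column to the node furthest below its fair share.
                deficits = [shares[i] * (col + 1) - assigned[i]
                            for i in range(len(speeds))]
                node = max(range(len(speeds)), key=deficits.__getitem__)
                owner.append(node)
                assigned[node] += 1
            return owner

        # A node twice as fast receives roughly twice as many columns:
        print(distribute_columns(12, [4.0, 2.0, 1.0]))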

    A Trace Macroscopic Description based on Time Aggregation

    Keywords: trace visualization; trace analysis; trace overview; time aggregation; parallel systems; embedded systems; information theory; scientific computation; multimedia applications; debugging; optimization.
    Today, because of the complexity of computing systems, tracing application executions is required to understand their behavior. Visualization techniques help represent trace content, but their scalability is limited both by human perception and by bounded screen resolution. To address this issue, we propose a visualization based on time aggregation that provides a concise overview of a trace whatever its size. The level of detail in this visualization is configurable: users can adjust the compromise between concision (the gain from aggregation) and information loss. They can then refine their analysis by zooming in on an interesting part and choosing a less aggregated overview for it. This visualization is implemented in our tool, Ocelotl, which lets users interact with the visualization by dynamically changing the selected time interval and its aggregation settings. The results presented in this paper show that the technique helps users correctly identify anomalies in very large trace files composed of up to forty million events.
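
    Ocelotl's actual partitioning is driven by an information-theoretic optimization; the sketch below is a greatly simplified greedy variant of the same compromise, assuming each time slice is an event-count histogram and merging adjacent slices while the information lost stays below a user threshold.

        # Greedy time aggregation: merge neighboring slices when cheap.
        import math

        def entropy(hist):
            total = sum(hist) or 1
            return -sum((c / total) * math.log2(c / total) for c in hist if c)

        def merge_loss(h1, h2):
            # Jensen-Shannon-style loss: extra entropy created by merging.
            n1, n2 = sum(h1) or 1, sum(h2) or 1
            merged = [a + b for a, b in zip(h1, h2)]
            return entropy(merged) - (n1 * entropy(h1) + n2 * entropy(h2)) / (n1 + n2)

        def aggregate(slices, max_loss=0.05):
            parts = [list(slices[0])]
            for h in slices[1:]:
                if merge_loss(parts[-1], h) <= max_loss:   # little information lost
                    parts[-1] = [a + b for a, b in zip(parts[-1], h)]
                else:                                      # behavior changed here
                    parts.append(list(h))
            return parts

    Raising max_loss yields a more concise overview; lowering it preserves more detail, mirroring the tool's user-adjustable compromise.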

    Performance Analysis of Irregular Task-Based Applications on Hybrid Platforms: Structure Matters

    Efficiently exploiting computational resources in heterogeneous platforms is a real challenge, which has motivated the adoption of the task-based programming paradigm, where resource usage is dynamic and adaptive. Unfortunately, classical performance visualization techniques used in routine performance analysis often fail to provide any insight in this new context, especially when the application structure is irregular. In this paper, we propose several performance visualization techniques and modeling strategies motivated by the analysis of task-based multifrontal sparse linear solvers, whose structure is particularly complex. We show that, by building both on a performance model of irregular tasks and on the structure of the application (in particular the elimination tree), we can detect and highlight anomalies and understand resource utilization from the application point of view in a very insightful way. We validate these novel performance analysis techniques with the QR_mumps sparse parallel solver by describing a series of case studies in which we identify and address nontrivial performance issues thanks to our visualization methodology.
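
    One way to read "building on a performance model of irregular tasks" is to fit, per task kind, an expected duration from task size and flag large deviations. The sketch below assumes hypothetical trace records of the form (kind, gflops, seconds) and is not the paper's actual model.

        # Flag tasks running much slower than a per-kind linear model predicts.
        import numpy as np

        def flag_slow_tasks(tasks, factor=2.0):
            """tasks: list of (kind, gflops, seconds) tuples."""
            anomalies = []
            for kind in {t[0] for t in tasks}:
                sub = [(g, s) for k, g, s in tasks if k == kind]
                g = np.array([x[0] for x in sub])
                s = np.array([x[1] for x in sub])
                slope = (g @ s) / (g @ g)   # least-squares fit through the origin
                anomalies += [(k, gf, sec) for k, gf, sec in tasks
                              if k == kind and sec > factor * slope * gf]
            return anomalies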

    Ocelotl: Large Trace Overviews Based on Multidimensional Data Aggregation

    Performance analysis of parallel applications is commonly based on execution traces that may be investigated through visualization techniques. The weak scalability of such techniques appears when traces get larger both in time (many events registered) and in space (many processing elements), a very common situation for current large-scale HPC applications. In this paper, we present an approach to tackle such scenarios and give a correct overview of the behavior registered in very large traces. Two configurable and controlled aggregation-based techniques are presented: one based exclusively on temporal aggregation, and another consisting of a spatiotemporal aggregation algorithm. The paper also details the implementation and evaluation of these techniques in Ocelotl, a performance analysis and visualization tool that overcomes current graphical and interpretation limitations by providing a concise overview of the behavior registered in traces. The experimental results show that Ocelotl helps detect anomalies quickly and accurately in 8 GB traces containing up to two hundred million events.

    Interactive Analysis of Large Distributed Systems with Topology-based Visualization

    The performance of parallel and distributed applications is highly dependent on the characteristics of the execution environment. In such environments, the network topology and characteristics directly impact data locality, data movement, and contention, which are key phenomena for understanding the behavior of such applications and possibly improving it. Unfortunately, few visualizations available to the analyst are capable of accounting for these phenomena. In this paper, we propose an interactive topology-based visualization technique, based on data aggregation, that makes it possible to correlate network characteristics, such as bandwidth and topology, with application performance traces. We claim that this kind of visualization enables the exploration and understanding of nontrivial behaviors that are impossible to grasp with classical visualization techniques. We also claim that the combination of multi-scale aggregation and dynamic graph layout allows our visualization technique to scale seamlessly to large distributed systems. We support these claims through a detailed analysis of a high-performance computing scenario and of a grid computing scenario.
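
    A minimal sketch of the multi-scale aggregation idea: traffic between individual hosts is rolled up to a coarser level of the platform hierarchy so that the graph stays readable. The dotted host-naming scheme is a made-up convention, not the tool's data model.

        # Roll host-to-host traffic up to, e.g., the cluster level.
        from collections import defaultdict

        def rollup(traffic, level=1):
            """traffic: {('siteA.cluster1.host3', 'siteB.cluster2.host9'): bytes}"""
            agg = defaultdict(int)
            for (src, dst), volume in traffic.items():
                a = '.'.join(src.split('.')[:level + 1])
                b = '.'.join(dst.split('.')[:level + 1])
                if a != b:            # intra-group edges disappear after merging
                    agg[(a, b)] += volume
            return dict(agg)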

    A Spatiotemporal Data Aggregation Technique for Performance Analysis of Large-scale Execution Traces

    Analysts commonly use execution traces collected at runtime to understand the behavior of applications running on distributed and parallel systems. These traces are inspected post mortem using various visualization techniques that, however, do not scale properly to large numbers of events. This issue, mainly due to human perception limitations, is also the result of bounded screen resolutions, which prevent the proper drawing of many graphical objects. This paper proposes a new visualization technique that overcomes these limitations by providing a concise overview of trace behavior as the result of a spatiotemporal data aggregation process. The experimental results show that this approach can help the quick and accurate detection of anomalies in traces containing up to two hundred million events.
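
    In the simplest reading, spatiotemporal aggregation reduces a processes-by-time matrix of event counts in both dimensions at once. The NumPy sketch below shows that reduction with fixed block sizes; the actual technique chooses partitions adaptively rather than in fixed blocks.

        # Reduce a (processes x time-slices) count matrix by block summation.
        import numpy as np

        def block_reduce(counts, p_group, t_group):
            p, t = counts.shape
            trimmed = counts[:p - p % p_group, :t - t % t_group]
            ph, tw = trimmed.shape
            return trimmed.reshape(ph // p_group, p_group,
                                   tw // t_group, t_group).sum(axis=(1, 3))

        # e.g. 1024 processes x 10000 slices -> a 16 x 100 overview:
        overview = block_reduce(np.random.poisson(3, (1024, 10000)), 64, 100)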

    StarVZ: Performance Analysis of Task-Based Parallel Applications

    High-performance computing (HPC) applications enable the solution of compute-intensive problems in feasible time. Among the many HPC paradigms, task-based programming has gathered community attention in recent years. This paradigm enables constructing an HPC application using a more declarative approach, structuring it as a directed acyclic graph (DAG). The performance evaluation of these applications is as hard as in any other programming paradigm. Understanding how to analyze these applications, employing the DAG and runtime metrics, presents opportunities to improve their performance. This article describes the StarVZ R package, available on CRAN, for the performance analysis of task-based applications. StarVZ transforms runtime trace data into different visualizations of the application behavior, so that an analyst can understand their applications' performance limitations and compare multiple executions. StarVZ has been successfully applied to several case studies, showing its applicability in a number of scenarios.
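
    StarVZ itself is an R package, so the snippet below is not its API; it is a small Python/matplotlib sketch of the kind of space/time (Gantt-like) view such tools produce, using made-up trace records of the form (worker, task kind, start, end).

        # Draw a tiny space/time view of task executions per worker.
        import matplotlib.pyplot as plt

        trace = [(0, 'potrf', 0.0, 1.0), (1, 'trsm', 0.2, 1.5),
                 (0, 'gemm', 1.1, 2.4), (1, 'gemm', 1.6, 2.2)]
        colors = {'potrf': 'tab:red', 'trsm': 'tab:blue', 'gemm': 'tab:green'}

        fig, ax = plt.subplots()
        for worker, kind, start, end in trace:
            ax.barh(worker, end - start, left=start, color=colors[kind])
        ax.set_xlabel('time (s)')
        ax.set_ylabel('worker')
        fig.savefig('gantt.png')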

    Measuring phenology uncertainty with large scale image processing

    One standard method to capture data for phenological studies is with digital cameras taking periodic pictures of vegetation. The large volume of digital images introduces the opportunity to enrich these studies by incorporating big data techniques. The new challenges, then, are to efficiently process large datasets and produce insightful information by controlling noise and variability. On these grounds, the contributions of this paper are the following: (a) a histogram-based visualization for large-scale phenological data; (b) phenological metrics based on the HSV color space that enhance this histogram-based visualization; (c) a mathematical model to tackle the natural variability and uncertainty of phenological images; and (d) the implementation of a parallel workflow to process a large amount of collected data efficiently. We validate these contributions with datasets taken from the Phenological Eyes Network (PEN), demonstrating the effectiveness of our approach. The experiments presented here are reproducible with the provided companion material.
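
    As a rough illustration of contribution (b), an HSV-based metric can be as simple as the fraction of sufficiently saturated pixels whose hue falls in a green band; the thresholds below are illustrative, not the paper's calibrated values.

        # Per-image green fraction computed in the HSV color space.
        import numpy as np
        from PIL import Image

        def green_fraction(path, hue_lo=60.0, hue_hi=180.0, min_sat=0.15):
            hsv = np.asarray(Image.open(path).convert('HSV'), dtype=float)
            hue = hsv[..., 0] * 360.0 / 255.0   # PIL stores hue as 0..255
            sat = hsv[..., 1] / 255.0
            mask = (hue >= hue_lo) & (hue <= hue_hi) & (sat > min_sat)
            return float(mask.mean())

    Computed over a camera's image series, such per-image values could feed the kind of histogram-based visualization the paper proposes.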

    Detecting Performance Anomalies in Task-Based High-Performance Applications on Hybrid Clusters

    Programming paradigms in high-performance computing have been shifting towards task-based models, which can more readily adapt to heterogeneous and scalable supercomputers. Detecting performance anomalies in such environments is particularly difficult, since it must consider architecture heterogeneity, variability, and the ability to obtain trusted measurements. This work presents a case study on the detection of anomalies in the execution of the well-known tiled dense Cholesky factorization developed with StarPU. Our experiments have been conducted on a variety of hybrid multi-node platforms to demonstrate how we can detect and highlight performance anomalies.
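
    The study's detection pipeline is richer than this, but a minimal, hedged sketch of a flagging step could use a robust z-score over the durations of one task kind on one resource type:

        # Flag durations far from the median (robust z-score via the MAD).
        import statistics

        def robust_outliers(durations, cutoff=3.5):
            med = statistics.median(durations)
            mad = statistics.median(abs(d - med) for d in durations) or 1e-9
            return [d for d in durations if 0.6745 * abs(d - med) / mad > cutoff]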